fix: honor Hygon DCU SLA topk environment setting#1197
Conversation
(cherry picked from commit e8ee93a79bd20dce2d084e992a8e140710f2c9b6)
There was a problem hiding this comment.
Code Review
This pull request updates the sparse attention configuration in flash_attn.py by changing the default fallback value of the SPARSE_ATTN_TOPK environment variable to 0.4 and dynamically passing topk_value instead of a hardcoded value. The review feedback recommends adding validation and error handling when parsing SPARSE_ATTN_TOPK to prevent potential runtime crashes from invalid float values or values outside the expected range.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| # Use Flash Attention 2.6.1 (ROCm version) with varlen interface | ||
| if SAPRDE_LINEAR_ATTN and int(os.getenv("USE_SLA", 0)) and q.shape[1] == k.shape[1]: | ||
| topk_value = float(os.getenv("SPARSE_ATTN_TOPK", "0.5")) | ||
| topk_value = float(os.getenv("SPARSE_ATTN_TOPK", "0.4")) |
There was a problem hiding this comment.
Parsing the environment variable SPARSE_ATTN_TOPK directly into a float without validation can lead to runtime crashes (if the value is not a valid float) or unexpected behavior/crashes in the underlying sparse attention kernel (if the value is outside the valid range of (0.0, 1.0]). It is safer to wrap this in a try-except block and validate that the resulting value is within the expected range.
try:
topk_value = float(os.getenv("SPARSE_ATTN_TOPK", "0.4"))
if not (0.0 < topk_value <= 1.0):
raise ValueError
except ValueError:
logger.warning("Invalid SPARSE_ATTN_TOPK value. Falling back to default 0.4.")
topk_value = 0.4|
We will check this pr. And please pay attention to the code format:
|
|
Do not use environment variables to control the sparsity of attention, especially since this switch only works under hygons. You can write two separate attention classes. See: https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/attn/flash_attn.py https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/attn/sla_attn.py |
Thanks for the feedback. This PR mainly fixes an issue where topk does not take effect in the Hygon DCU SLA attention path. The current SLA interface uses a hardcoded topk value instead of the value from SPARSE_ATTN_TOPK. |
Summary
SPARSE_ATTN_TOPKWhy
This fixes an inconsistency where the Hygon DCU fallback path used a different top-k default than the optimized runtime path.
Validation
ModelTC/LightX2V:main(89dfa833)git diff --checkpassed for the PR branch